-
-
Notifications
You must be signed in to change notification settings - Fork 867
Improve reliability of completion submission #1711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 7299840 The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Warning Rate limit exceeded@nicktrn has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 24 minutes and 10 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughThis pull request updates the task completion flow by introducing an acknowledgment mechanism. The changes include new constants and a refactored method in the TaskCoordinator class to send completion acknowledgments using an exponential backoff strategy. A new socket event handler and corresponding message type for task completions with acknowledgment are added, and the messaging schema is adjusted accordingly. Additionally, the socket messaging utility now accepts an optional timeout parameter for acknowledgments, allowing for configurable retry behavior and improved error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant TC as TaskCoordinator
participant EB as ExponentialBackoff
participant SS as Socket Service
participant WH as Webapp Handler
TC->>EB: Call sendCompletionWithAck()
EB->>SS: Emit TASK_RUN_COMPLETED_WITH_ACK (with timeout/retries)
SS->>WH: Forward acknowledgment request
WH-->>SS: Return completion result
SS-->>EB: Relay acknowledgment response
EB-->>TC: Return ack result
sequenceDiagram
participant Caller as Message Sender
participant ZS as ZodSocketMessageSender
participant Socket as Socket Instance
Caller->>ZS: sendWithAck(type, payload, timeout?)
alt Timeout provided
ZS->>Socket: this.#socket.timeout(timeout)
else No timeout
ZS->>Socket: Use default socket
end
Socket->>ZS: Emit message with acknowledgment
ZS-->>Caller: Return acknowledgment result
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
apps/coordinator/src/index.ts (4)
33-37
: Consider logging or validating the parsed environment variables.The parse logic correctly falls back to defaults if the environment variables are invalid or missing. However, you might want to:
- Log the resultant values (30 seconds / 7 retries by default).
- Validate against negative or excessively large values if misconfiguration is a potential issue.
728-759
: Improve granularity of error handling and reporting.This new
sendCompletionWithAck()
function robustly handles errors and logs them. However, returning a simple boolean might limit insights into the cause of the failure. Consider returning more descriptive error information or rethrowing errors so that upstream logic can decide how to proceed.
760-797
: Factor out exponential backoff flow for clarity.The added logic in
completeWithoutCheckpoint
integrates exponential backoff for acknowledgment, which is correct and improves reliability. To keep this function focused, consider extracting the backoff steps into a dedicated helper method. This helps maintain readability and testability.
800-800
: Revisit function naming to improve maintainability.These newly introduced calls to
await completeWithoutCheckpoint(...)
unify the finalization flow. However, the function name doesn’t clearly convey theshouldExit
argument’s behavior. For better clarity, consider renaming the function or splitting the logic to reflect whether we're “exiting” or “continuing” post-completion.Also applies to: 808-808, 813-813, 818-818, 827-827, 836-836, 860-860
packages/core/src/v3/schemas/messages.ts (1)
440-440
: Consider consistent version defaults across related message types.The
TASK_RUN_COMPLETED
message defaults to "v1" whileTASK_RUN_COMPLETED_WITH_ACK
defaults to "v2". Consider using the same default version for consistency, or document why they differ.- version: z.enum(["v1", "v2"]).default("v1"), + version: z.enum(["v1", "v2"]).default("v2"),Also applies to: 427-427
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/coordinator/src/index.ts
(6 hunks)apps/webapp/app/v3/handleSocketIo.server.ts
(1 hunks)packages/core/src/v3/schemas/messages.ts
(1 hunks)packages/core/src/v3/zodSocket.ts
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: typecheck / typecheck
🔇 Additional comments (5)
apps/coordinator/src/index.ts (1)
14-14
: Validate version compatibility for new imports.Importing
ExponentialBackoff
,HttpReply
, andgetTextBody
from@trigger.dev/core/v3/apps
looks correct, but ensure that all other files and dependencies consuming these are on a compatible version to avoid runtime errors.apps/webapp/app/v3/handleSocketIo.server.ts (1)
139-176
: Good addition for acknowledgment-based completions.This new
TASK_RUN_COMPLETED_WITH_ACK
handler cleanly logs errors and returns structured success/failure responses. The approach effectively captures stack traces for debugging, which is recommended. No concerns noted.packages/core/src/v3/zodSocket.ts (1)
296-298
: Optional timeout parameter enhances robustness.Allowing an optional
timeout
insendWithAck
provides a clear mechanism for controlling acknowledgment waits. Ensure to handle edge cases (e.g., very large or negative timeouts) and confirm that callers handle potential timeout rejections gracefully.Also applies to: 311-312, 314-314
packages/core/src/v3/schemas/messages.ts (2)
438-451
: LGTM! The new message type enhances reliability through acknowledgment.The implementation correctly adds acknowledgment support for task run completion messages, which aligns with the PR's objective of improving reliability. The schema is well-structured and consistent with other message types in the file.
438-451
:✅ Verification successful
Verify the coordinator's handling of the new message type.
Let's ensure the coordinator is properly implementing this new message type with acknowledgment support.
🏁 Script executed:
#!/bin/bash # Search for implementations using the new message type rg "TASK_RUN_COMPLETED_WITH_ACK" -A 5 # Search for potential places that should be updated to use the new type rg "TASK_RUN_COMPLETED[^_]" -A 5Length of output: 18449
Coordinator’s handling of TASK_RUN_COMPLETED_WITH_ACK verified
- The new message type is correctly implemented in the coordinator (see
apps/coordinator/src/index.ts
) with appropriate timeout, retry/backoff, and error logging for acknowledgment support.- The schema update in
packages/core/src/v3/schemas/messages.ts
reflects the intended design (using version"v1"
or"v2"
along with an optional checkpoint).- Legacy references to
TASK_RUN_COMPLETED
are still in use across various modules (e.g., CLI and webapp), which appears intentional for backward compatibility. Please ensure that these legacy flows continue to operate as expected without interfering with the new ACK-based message handling.
Retries were previously driven by the run controller exclusively. This change makes it possible for the coordinator to do the same. We retry with exponential backoff, by default for up to ~2m.
Timeout and max retries can be configured with the following env vars:
TASK_RUN_COMPLETED_WITH_ACK_TIMEOUT_MS
TASK_RUN_COMPLETED_WITH_ACK_MAX_RETRIES
Summary by CodeRabbit
New Features
Refactor